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Abstract 

Antibody-based therapeutics provides novel and efficacious treatments for a number of diseases. Traditional experimental 
approaches for designing therapeutic antibodies rely on raising antibodies against a target antigen in an immunized animal 
or directed evolution of antibodies with low affinity for the desired antigen. However, these methods remain time 
consuming, cannot target a specific epitope and do not lead to broad design principles informing other studies. 
Computational design methods can overcome some of these limitations by using biophysics models to rationally select 
antibody parts that maximize affinity for a target antigen epitope. This has been addressed to some extend by OptCDR for 
the design of complementary determining regions. Here, we extend this earlier contribution by addressing the de novo 
design of a model of the entire antibody variable region against a given antigen epitope while safeguarding for 
immunogenicity (Optimal Method for Antibody Variable region Engineering, OptMAVEn). OptMAVEn simulates in silico the 
in vivo steps of antibody generation and evolution, and is capable of capturing the critical structural features responsible for 
affinity maturation of antibodies. In addition, a humanization procedure was developed and incorporated into OptMAVEn 
to minimize the potential immunogenicity of the designed antibody models. As case studies, OptMAVEn was applied to 
design models of neutralizing antibodies targeting influenza hemagglutinin and HIV gp120. For both HA and gp120, novel 
computational antibody models with numerous interactions with their target epitopes were generated. The observed rates 
of mutations and types of amino acid changes during in silico affinity maturation are consistent with what has been 
observed during in vivo affinity maturation. The results demonstrate that OptMAVEn can efficiently generate diverse 
computational antibody models with both optimized binding affinity to antigens and reduced immunogenicity. 
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Introduction 

Therapeutic antibodies are widely recognized to be among the 
most promising agents to treat various diseases, including cancers, 
immune disorders, and infections [1,2]. The earliest used 
technology for the generation of therapeutic antibodies is raising 
antibodies against a target antigen in immunized mice. Although 
widely utilized, the low clinical success rate using mouse antibodies 
reflects that these foreign proteins can be highly immunogenic in 
humans, and they typically have weak interactions with human 
complement and antibody receptors, resulting in inefficient 
effector functions [3], These limitations have largely been 
overcome by grafting the variable domains of a mouse monoclonal 
antibody to the constant domains of a human antibody, a process 
known as chimerization [4,5]. Although chimeric antibodies are 
more human-like and induce considerably less response by the 
human immune system, they are still not completely human. More 
recently, complete human antibodies have been designed using 
directed evolution techniques [6,7] that mimic the natural 
selection of the process to evolve antibodies towards a desired 



property. Among them, phage display [8,9], a technique based on 
the presentation of peptides or protein fragments on the surface of 
bacteriophages, is most widely used and offers robust and 
complementary routes to the generation of potent human 
antibodies. Despite these advances in the design of antibodies, 
current experimental methods still have considerable limitations 
and cannot: (1) target a specific antigen epitope, (2) provide 
universally applicable structural design routes, and (3) rationally 
engineer mutations with significantly reduced immunogenicity. 

By contrast, computational methods could efficiently overcome 
some of these shortcomings. For example, a number of successful 
applications of computational methods have been reported in 
antibody-antigen recognition [10-13], antibody structure and 
stability prediction [14—17], design of mutations and antibody- 
antigen interface [16-22], and immunogenicity prediction [23,24]. 
However, most of the current examples of computational antibody 
design have been largely limited to existing antigen-antibody 
complex structures (i.e. re-designs of antigen-antibody interfaces), 
and the de novo design of antibodies to target a pre-selected 
antigen epitope has remained elusive. 
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To address the limitations of current platforms for antibody 
design, we have developed the OptCDR method that can de novo 
design an antibody paratope model against any targeted antigen 
epitope by modeling and optimizing the complementarity 
determining regions (CDRs) [21]. However, CDRs only capture 
part of the binding capacity of an antibody and were not 
constrained to fully human designs. Therefore, in this paper we 
take the next step and introduce a new computational framework 
named OptMAVEn for de novo design of not just the CDRs, but 
fully human, complete antibody variable domain models by 
expanding the concepts pioneered in OptCDR. OptMAVEn 
designs antibody models by mimicking the natural evolution of 
antibody generation and affinity maturation (Fig. SI). In 
particular, it is implemented as a three-step workflow (Fig. 1). 
First, for a given antigen, an ensemble of possible antigen binding 
conformations is generated in a virtual antibody-binding site. This 
site is defined as a rectangular box that covers all the geometry 
centers of 750 antigen epitopes with known structures (Fig. 2A- 
2D). Second, the best scored antigen conformation and combina- 
tion of six modular antibody parts from the Modular Antibody 
Parts (MAPs) database [25] are selected using a mixed-integer 
linear programming (MILP) formulation and the initial antibody 
structure is predicted by assembling the six MAPs. Third, the 



antibody model is redesigned and optimized through our Iterative 
Protein Redesign & Optimization (IPRO) protocol [26]. 

A new aspect in the last step of OptMAVEn is that we 
incorporated a 9-mer-sequence-based humanization algorithm to 
reduce the potent immunogenicity of designed antibody models 
while optimizing their binding affinity to an antigen. The 
immunogenicity of an antibody is triggered by molecular 
recognition of its immunogenic peptides by the major histocom- 
patibility complex (MHC) and/or T cells [27]. A variety of 
humanization methods such as CDR grafting [28], resurfacing 
[29], superhumanization [30], framework shuffling [31] and 
guided selection [32] have produced antibodies more homologous 
to human sequences, often but not always leading to reductions in 
immunogenicity to clinically acceptable levels [33]. Our approach 
is inspired by a humanness score termed "human string content" 
(HSC) [23] that quantifies a sequence at the level of potential 
MHC/T cell epitopes. HSC is principally based on the 
assumption that having more human sequence present in the 
antibody will reduce its immunogenic potential [34—36]. 

OptMAVEn aims at designing a diverse library of antibody 
models that is simultaneously co-optimized on both binding 
affinity to an antigen epitope and immunogenicity. The resulting 
computational antibody models are de novo designs, probably 
never seen in nature before, generated de novo through 
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Figure 1. OptMAVEn workflow. Step 1: Antigen positioning. Step 2: Assembly of antibody models targeting the antigen epitope. Step 3: 
Affinity maturation and humanization of antibody models by redesigning. 
doi:1 0.1 371 /journal.pone.01 05954.g001 
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Figure 2. Illustrations of antibody-binding site and the algorithm of antigen positioning. H and L chains are colored in cyan and 
green, respectively; epitope is colored in magenta. (A) Database of 750 antibody-antigen structures. H and L chains are colored in cyan and 
green. Antigens are in different colors. (B) All the complex structures superimposed onto a reference antibody structure whose coordinate center of 
CDRs attachment points was placed on the origin. (C) A rectangular box that covers all the mean epitope coordinates. Figure S2 shows the 
distributions for the mean coordinates of all the epitopes along the X, Y, and Z axes. (D) The virtual antibody-binding site. (E) An antigen initial 
conformation. Epitope is colored in magenta. (F) The rotated antigen conformation having the most negative epitope coordinates. (G) A positioned 
antigen conformation with epitope's geometry center at one grid point. (H) A rotated antigen conformation around Z axis. 
doi:1 0.1 371 /journal.pone.01 05954.g002 



computations. In this study, we benchmarked the corresponding 
protocols in each step individually and show that OptMAVEn 
could: (1) efficiendy position antigens in the antibody-binding site 
with a high success rate of 96% using a benchmark set of 120 
experimentally determined antigen-antibody complexes (2) redis- 
cover 57.5% of native antibody parts using MILP based 
optimization for the same benchmark set (3) recapitulate known 
interactions responsible for affinity maturation using two known 
germline and affinity maturation antibody pairs as the benchmark 
set and (4) unambiguously distinguish human antibody sequences 
from other species (P value <0. 000001). As case studies, we 
applied OptMAVEn to design broadly neutralizing antibody 
models against two antigens: envelope glycoprotein gpl20 (gpl20) 
and hemagglutinin (HA), which are well-known antigens for the 
HIV-1 and influenza viruses, respectively. The presented designs 
are diverse in sequence and structure spanning a wide array of 
affinity maximization interactions. These designed interactions 
(partially) mimic the geometry of their natural receptors while 
maintaining the computational humanness of the sequences. 
Overall, our results demonstrate that OptMAVEn is a computa- 
tional framework for an open challenge that could make significant 



contribution to the development of a new generation of 
therapeutic antibodies and vaccines. 

Method 

Step 1: Antigen positioning 

Antigen positioning starts with the definition of a general 
antibody-binding site (Fig. 2A-2D), which is represented by a 
rectangle grid box located close to the origin. This box was 
obtained by analyzing the locations of known antigen epitopes 
from a precompiled antibody-antigen crystal structure database 
consisting of 750 antibody-antigen complexes and sharing three 
common features: (1) X-ray resolution is better than 2.5 A, (2) 
both heavy and light chains are available in the structures, and (3) 
an antigen is included for each structure. Among the 750 
structures, there are 214 hapten, 109 peptide and 427 protein 
binders. The size of the box was adjusted to include all the mean 
coordinates of atoms of antigen epitopes, and its X, Y and Z values 
were within the ranges of (— 10 A, 5 A), (—5 A, 10 A) and (3.75 A, 
16.25 A), respectively (Fig. 2D). The box was divided into a set of 
grid points by assigning grid spacing, which is user-defined (default 
of 2.5, 2.5 and 1.25 A for X, Y and Z axis, respectively). During 
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the positioning, the coordinate center of an antigen epitope was 
placed into its corresponding grid box (Fig. 2G) and rotated along 
the Z coordinate for 360° with an interval of 60° (Fig. 2H) to 
generate an ensemble of initial antigen positions. Prior to 
positioning, the antigen was rotated so that the epitope had the 
most negative Z coordinates (Fig. 2E and 2F), thus ensuring that 
the target epitope forms the maximum number of interactions 
with the CDRs. In total, 3234 (7x7x11x6) initial positions of 
antigens were generated for this study. 



^X Up =H d , Vie{HV,HCDR3,HJ} (3) 

p=i 



N t Nj Nj 

%kv,p = ^2 Xkcdiq, p = ^2 Xkj, p (4) 
P =\ p=\ P =i 



Step 2: Assembly of antibody models targeting the 
antigen epitope 

At each position, the interaction energies (IEs), including van 
der Waals and electrostatic terms, between the antigen and all 
MAPs were calculated and stored. To avoid detrimental clashes, a 
"softening" atom van der Waals radius whose value is half of that 
from the CHARMM force field [37] was used to estimate the 
hydrophobic interaction. The modular antibody parts were 
previously constructed in the spirit of template-based modeling, 
with each part being a prototype structure of the random variable 
(V), diversity (D), and joining (J) regions in the MAPs database 
[25]. The recently generated database contains 929 parts 
constructed from an analysis of 1168 human, humanized, 
chimeric, and mouse antibody structures and encompasses all 
currently observed structural diversity of antibodies. Table SI 
shows the numbers of antibody parts from the MAPs database. V, 
CDR3 and J structures can assemble both H and L chains of an 
antibody. There exist two types of light chain, KAPPA and 
LAMBDA, which are treated separately. Each MAPs structure 
was numbered using FMGT's unique numbering [38-41] and 
consistently placed in the 3D space so that its CDRs attachment 
points were approximately on the same X-Y plane and centered 
on the origin with CDRs perpendicularly directed in the positive Z 
direction. 

Once the IEs are calculated, the problem of selecting the best 
scoring combination of non-clashing antibody parts at each position 
could be mathematically represented using an MILP representation. 
This requires the definition of the index set / 
/ = {( \HV, HCDR3, HJ, KV, KCDR3, KJ, LV , LCDKi, LJ}, 
denoting modular antibody parts. H denotes Heavy, K for KAPPA 
and L for LAMBDA. V denotes Variable region, CDR3 for Diversity 
region and J for the Joining region. Also required is the set 
P={p 1 1,2, . . . ,Ni}, which denotes the number of MAPs structures 
at position i. Set TP**"* contains all pairwise MAPs combinations (i 1 , 
p\) and (i2, p2) that sterically clash with one another. The binary 
variables Xf p denotes if antibody part p is selected at position i 
(Xjj = 1) or not (Xij = 0). Parameter E\ encodes the calculated 
energy between structure p at position i and the antigen substrate. 
Switching parameters H d and L d have values of one if a VH or VL 
respectively, are being designed and zero if they are not chosen. This 
allows the user to possibly design only a VH or VL, as would be 
appropriate for a nanobody. The MILP formulation can then be 
written as follows: 

9 Nj 

Minimize ^ ^ x Up E \ >p ( 1 ) 



X, Xpl +X i2 , p2 <l, V(ilj>Ui2*2yeIF* ah (2) 



N t Ni Ni 

%LV,p = ^2 XLCDR3,p = ^2 ^LJ,p (5) 

p=l p=l p=l 



Ni Ni 

y Xkv, p +^2x LVtP =L d (6) 
p=\ p=\ 

Equation 1, the objective function, entails the minimization of 
the interaction energy between the antigen and the selected MAPs. 
The inequality in Equation 2 precludes the simultaneous presence 
of two antibody part structures that sterically clash. Equation 3 
guarantees exactly one part is selected for each heavy chain region 
when a VH is designed. OptMAVEn defines both KAPPA and 
LAMBDA light chains and Equations 4—6 make sure that when a 
VL is being designed, it is composed of parts entirely from one 
domain type or the other. Equation 4 requires that the same 
number of KAPPA V, CDR3, and J structures are selected and 
Equation 5 does the same for the LAMBDA structures. Equation 6 
then selects either a KAPPA or LAMBDA V part when a VL 
domain is being designed, and Equations 4 and 5 together ensure 
the selection of the proper type of CDR3 and J structures. The 
optimization formulation described collectively by Equations (1-6) 
can be solved using CPLEX [42] called directly from Python. 

After the MILP selection, the lowest energy solutions (10 in our 
current study) were selected and submitted to local refinement (i.e., 
500 iterations) by randomly moving antigens and reselecting the 
optimal MAPs. The goal of the refinement is to explore the local 
conformational space for the antigen and to energetically rescreen 
the best MAPs. In each iteration, the antigen was randomly 
translated and rotated for a small distance and angles whose value 
was generated using a Gaussian distribution centered at zero with 
a standard deviation of 0.5 A and 5°, respectively. Subsequently, 
the IEs between the antigen at this position and the entire MAPs 
database were reevaluated. A simulated annealing algorithm was 
used to determine whether or not to retain the results of the 
iteration. 

Step 3: Computational affinity maturation and 
humanization by redesigning 

The refined antibody models were redesigned with IPRO [26] 
in order to find sequences that maximally improve the binding 
affinity and possess minimal computational immunogenicity. The 
standard IPRO design protocol was modified for use in 
OptMAVEn, which consists of five main steps: sequence design, 
backbone perturbation, optimal rotamer selection, antigen re- 
docking and energy evaluation. This sequence is carried out 
iteratively (default max of 10,000 iterations). 

Sequence design. For each round, a set of 1-3 continuous 
residues in either VH or VL is randomly selected for mutation. 
Residues from CDRs are given 3-fold higher preference over those 
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from frameworks (FRs) because more mutations are expected in 
the binding site [25] . A selected residue is allowed to mutate to a 
set of permitted amino acid types that is site-specific and could be 
determined according to sequence alignments of relevant 
antibodies. In this study, the alignments of broadly neutralizing 
HIV and influenza antibody sequences were used (Data SI and 
S2). 

At the same time, during the sequence design, a sequence-based 
humanization algorithm is introduced to guarantee that designed 
sequences are human. This algorithm starts with constructing a 
human antibody 9-mer sequences database by splitting 69,032 
human antibody sequences (collected from Abysis [43], IMGT 
[38] and the PDB database [44]) into 1,309,657 unique human 9- 
mers. For a given test sequence, all the possible 9-mer sequence 
frames are extracted and the minimum number of mutations 
between each frame and those from the human 9-mer-sequence 
database is counted. All frame scores are totaled and defined as the 
humanness score (HScore, Equation 7). 

N 

HScore(S)=Y J M f a ( 7 ) 

S denotes a given antibody sequence, N is the total number of 
all the possible 9-mers-fragment sequences, and M is the minimum 
count of the mutations between a 9-mer sequence and those from 
the human 9-mer fragment database. During each iteration only 
sequences with HScore equal to or lower than the previous 
iteration are retained for further optimization. 

Backbone perturbation. Once the sequence is designed, the 
perturbed region, which includes 5 more residues on both sides of 
the design positions and surrounding residues within 4.5 A, is 
subjected to backbone perturbation. For each one of the perturbed 
residues, the (p and \|/ dihedral angles are randomly perturbed 
using a Gaussian distribution centered at zero degrees with a 
standard deviation of 1.5°. No modification greater in magnitude 
than five degrees is permitted. Five residues on each side of the 
perturbed region are free to move during the perturbation to 
prevent the dihedral angle changes from causing long-range 
structural effects. These residues are mutated to glycine and an 
energy minimization with strong restraints on the perturbed 
dihedral angles is carried out to make the perturbed structure. All 
other residues are fixed in place during the process. Different 
backbone conformations are sampled by iteratively perturbing 
small regions of the backbone that are randomly chosen during 
each cycle throughout the variable domains. 

Optimal rotamer selection. The following step of IPRO 
involves the use of a rotamer library and MILP optimization to 
repack the amino acid side chains in and around the perturbed 
region. The perturbed residues that are mutated to glycine are 
always repacked. The residues that are spatially close to the 
perturbed region are also repacked. Only design positions within 
the perturbed region are permitted to mutate. The permitted kinds 
of amino acid mutations for each design position are specified 
during the sequence design stage. We use the abbreviations R and 
NR to refer to antibody designed residues (side-chains) and 
antigen/conserved antibody parts, respectively. 

The R-NR (i.e. the antigen and all parts of the antibody that 
are not being replaced by rotamers) and R-R interaction energies 
are calculated using the pairwise additive, non-bonded energy 
terms from the force field (i.e. van der Waals, electrostatics, and 
implicit solvation). Once all of the R-NR and R-R interaction 
energies have been calculated, MILP optimization is used to select 



the minimum energy arrangement of rotamers. The rotamer 
selection MILP has been previously discussed [26]. 

Antigen redocking. The fourth step of an IPRO iteration is a 
local, rigid-body redocking of the antigens. Since docking may 
take a long time for some antigens, this step is only carried out 
every third iteration. Docking uses the same pairwise additive 
energy functions used during the rotamer selection step. During 
docking, an antigen is randomly perturbed along and around the 
X, Y, and Z axes. The perturbations are generated using a 
Gaussian distribution centered at zero, with user-defined standard 
deviations (defaults of 0.2 A and 2.0°). After the antigen is 
perturbed, the net IE is evaluated and the movement of the 
antigen is kept or rejected based on the Metropolis criterion. Each 
antigen is sequentially randomly perturbed during an iteration of 
docking, and a user-defined number of iterations are carried out 
(default of 500 iterations). 

Energy evaluation. The fifth step involves a total energy 
minimization and complex and interaction energies evaluation. A 
high-resolution score function that evaluates van der Waals, 
electrostatics, bonds, angles, dihedral angles, improper dihedral 
angles, and generalized Born with molecular volume integration 
implicit solvation energy functions from CHARMM is used. The 
complex and interaction energies are used in the Metropolis 
criterion to determine whether or not to retain the results of the 
IPRO iteration. The user may set the temperature used to make 
the decision (default is to retain 25% of designs within 10 kcal/mol 
of the best design). 

Benchmark test set 

For benchmarking our antigen positioning and MILP selection 
algorithms, we curated a non-redundant set of bound antibody- 
antigen complex structures gathered from IMGT [40]. Identical 
antigens in different binding modes with antibodies were also 
included. This set includes 120 antibody-antigen complex crystal 
structures (Table S2) and at present, we only focus on the antigens 
that are either peptides or small proteins. Amino acid lengths for 
the antigens range from 4 to 148. This set is used to characterize 
the distribution of antigen epitopes, verify the initial antigen 
positioning and adjust the energy function. To test the IPRO 
design algorithm, we chose two germline (GL) / affinity matured 
(AM) pairs, all with known X-ray structures. Among them, one 
antibody pair is broadly neutralizing influenza virus antibody 
CH65 (AM) [45] and its putative GL precursor [16], and another 
is broadly neutralizing HIV-1 gpl20 antibody VRC01 (AM) [46] 
and its GL precursor [17]. The influenza HA1 and HIV-1 gpl20 
antigens were both modeled into the binding sites of their 
corresponding GL antibodies referencing their positions in the AM 
complex structures. For testing the proposed HScore, human, 
mouse, rat and rabbit antibody sequences were collected from 
Abysis [43] with the number of residues between 90 and 1 30. 

Implementation 

OptMAVEn is primarily written in Python and C++ and is 
available for download on our website, http://maranas.che.psu. 
edu. The most computationally demanding modules such as 
energy calculation, antigen positioning and humanness scores 
calculation are implemented in C++. Parts of OprMAVEn such as 
energy calculation between antigens and the entire MAPs and 
computational affinity maturation may use several processors at a 
time by sharing files. Using its default parameters and 10 
processors as an example, a design run against an antigen with 
100 amino acids is expected to take no more than three weeks. 
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Results 

Benchmark test of the initial antigen positioning 

The starting point for an OptMAVEn design is the positioning 
of the target antigen in a predefined antibody-binding site in order 
to obtain the initial antigen conformations. To evaluate whether 
the positioning protocol is capable of generating near-native poses, 
we tested it against a benchmark set (Table S2) that contains 120 
antigens varying in both length and conformations. As described 
in the Methods and shown in Fig. 2, each antigen was first rotated 
to make the epitope have the most negative Z coordinates and 
then positioned into the binding site box with its mean epitope 
coordinate placed in every grid point with rotations around Z axis. 
We used Root Mean Squared Deviation (RMSD) of backbone 
atoms between an antigen position and its corresponding native 
conformation as a metric for evaluating the quality of positioning. 
The quality of a positioning is classified as high (RMSD < 1 .0 A), 
medium (1.0 A<RMSD<2.0 A), and acceptable (2.0 A< 
RMSD<4.0 A) according to CAPRI-defined criteria for protein 
docking [11]. A successful prediction was described as a 
positioning run in which at least one of the ensemble members 
of initial positions has an acceptable RMSD (i.e.<4.0 A). The 
benchmark results of antigen positioning are summarized in 
Table 1. Across the entire benchmark set, OptMAVEn success- 
fully predicted at least acceptable or better quality solutions for 
115 targets, representing an overall success rate of 96%, which 
indicates OptMAVEn could sample the correct antigen orienta- 
tion and position without any a priori knowledge of antibody- 
binding site. With respect to antigen type, OptMAVEn success- 
fully predicted near-native structures for 99% of the peptide 
antigens and 92% of the protein antigens. The slightly better 
predictions of conformations of peptide antigens are most likely 
due to their relatively simple and linear epitopes. 

Four representative positioning successes of PDB 1MVU, 
1NQZ, 1DZB and 2FEL with RMSDs of 1.28, 2.27, 1.35 and 
2.74 A, respectively, are illustrated in Figure 3. As observed, the 
predicted positions of 1MVU and 1DZB are in close agreement 
with their corresponding native poses. For 1NQZ and 2FEL, the 
prediction quality is considered acceptable and further translations 
and rotations could improve the positions. The predictions for 
1 OBE and 1JRH serve as representative examples of positioning 
failures with RMSDs of 4.05 and 4.68, respectively. For lOBE, 
the predicted conformation of its L-shaped antigen adopted an 
almost opposite orientation compared to its native pose; for 1JRH, 
considerable differences are derived from rotations around the X 
and Y axes. In both cases, erroneous positioning mainly arise from 
the approximation of using coordinate centers of antigen epitopes 
for representing the entire epitopes, and further rigid-body 
rotations around X and Y axes are required for successful 
positioning. Extra rotation sampling around the X and Y axes will 
almost certainly give more accurate predictions, at the expense of 
longer computational time. 

Benchmark test for the MILP based residue design step 

In OptMAVEn, the variable domain of an antibody structure is 
initially predicted by assembling the six best-scoring MAPs (HV, 
HCDR3 and HJ for VH; KV, KCDR3, KJ or LV, LCDR3, LJ for 
VL) if both the VH and VL are being designed. For a given 
antigen conformation, the IE between this antigen and each MAPs 
is calculated using a pairwise energy functions involving both van 
der Waals and electrostatic terms (MILP energy). The MILP 
formulation described in Equations 1-6 is used to select the 
combination of MAPs structures with the lowest IE. To measure 
whether the MILP selection could identify native antibody parts 
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Figure 3. Examples of positioning successes and failures for 
peptide and protein binders. H and L chains are colored in 
cyan and green, respectively. Antigens are colored in yellow (native 
poses) and orange (best positioned poses). 
doi:1 0.1 371 /journal.pone.01 05954.g003 

efficiently, we tested it on the same benchmark set used to evaluate 
antigen positioning. Starting with only the native pose of each 
antigen, the IE to the best MAPs is compared to that to its native 
antibody part. The results are summarized in Table S3. For each 
complex, we also performed a further energy minimization and 
reevaluated the IE using the full CHARMM energy function. 

In the majority of cases, calculated MILP IEs were negative, 
indicating favorable binding between antibodies and antigens. In 
contrast, for five cases (1KTR, lOAZ, 1TZH, 2A6I and 2IFF) 
native antibodies were predicted to disfavor the antigen binding. 
This could arise from subtle steric clashes or over-approximated 
MILP energy functions. Therefore, we further minimized each 
complex and reevaluated the IEs using the full Charmm energy 
function, including the van der Waals, electrostatics, bonds, angles, 
dihedral angles, improper dihedral angles and generalized Born 
with molecular volume integration implicit solvation terms. Using 
this more detailed description for IE led to negative IEs for all 
antigen-antibody complexes. This suggests that with a relaxation 
and energy reevaluation step, it is possible for the MAPs structure 
selection MILP to design initial antibody models that successfully 
bind the antigen for favorable initial antigen positions (e.g. native 
positions). 

Of particular note is the fact that for 51/120 cases (i.e. 42.5%), 
the selected antibody models have better CHARMM IEs than the 



native antibodies, by an average of 95.2±90.7 kcal/mol. In the 
other 69 cases, the native complex is better by an average of 
9 1.8 ±64. 4 kcal/mol. It is not surprising that OptMAVEn 
identifies initial antibody models in some cases that have higher 
energy interaction scores than the native antibodies for two 
reasons. First, at any given moment the immune system will not 
have all possible naive antibodies available, so suboptimal 
antibodies can be expected to be chosen in many cases. In 
contrast, MILP selection will always be able to pick an optimal 
combination of MAPs structures. Second, the objective of an 
immune response is to eliminate a problem as rapidly as possible, 
not to design optimally binding antibodies. Although strong 
binding is important, other factors such as stability, concentration, 
and when an antibody is discovered play a role. Since OptMAVEn 
is solely trying to design optimal binders, it is to be expected that 
the designed antibody models would differ from the naturally 
occurring ones. 

Benchmark test of the IPRO design 

We next assessed the performance of the computational affinity 
maturation protocol in IPRO using two experimentally charac- 
terized GL and AM antibody pairs against influenza virus (AM 
antibody CH65) and HIV-1 (AM antibody VRC01), respectively. 
Structures for each pair of GL and AM structures were available 
[16,17,45,46]. Experimental data shows that the GL precursor of 
CH65 possesses 200-fold lower binding affinity to HA1 than the 
matured CH65 [16], and that VRC01 exhibits no detectable 
affinity for wild-type gp 1 20 [17]. Two design tests were performed. 
The first one considered mutations only among the residues that 
differ between GL and AM and a second where all residues could 
be mutated. The first test aimed to evaluate whether IPRO could 
redesign the antibody model to match the amino acid usage 
patterns in the AM pair starting from the GL sequence. The 
second test, without specified mutation positions, broadly models 
affinity maturations aiming to evaluate whether IPRO would 
independently discover results highly similar to natural affinity 
maturation. 

In the first test, forward and reverse designs were conducted for 
both systems. Forward design is denoted here as a design starting 
from a GL structure, while reverse design is the one starting from 
an AM structure. For each design, mutations along the entire 
sequence were specified according to their corresponding GL/AM 
partners. For each design, 25 independent IPRO trajectories were 
generated and the final IE results are the average of the 
trajectories, as listed in Table 2. As expected, the IEs indicate 
that the AM antibodies bind to antigens more tightly than the GL 
antibodies, which is in agreement with the free energy calculation 



Table 2. Forward and reverse designs using IPRO with specified positions for mutations. 





Antibody 


Structure 3 


Mutations b 




IE C 




IE difference" 






VH 


VL 


GL 


AM 




Influenza CH65 


4HK0 (GL) 


11 


6 


-294 


-444 


-150 




3SM5 (AM) 






-295 


-389 


-94 


HIV VRC01 


4JPK (GL) 


39 


25 


-167 


-258 


-90 




3NGB (AM) 






-272 


-384 


-111 



a The starting X-ray structures used for IPRO designs. GL in the parentheses indicate the structure is germline and AM is for affinity maturation. 
b The number of mutations identified by comparing GL and AM antibody sequences, 
interaction energy between an antigen and an antibody. All energies are in kcal/mol. 
d The IE difference calculated from AM - GL. 
doi:1 0.1 371 /journal.pone.01 05954.t002 
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Table 3. Forward designs using IPRO without specified positions for mutations. 



Antibody Structures* Mutations b Native mutations recovery' IE IE difference 6 

GL AM 

Influenza CH65 4HK0 17 6 -294 -1663 -1359 

HIVVRC01 4JPK 64 13 -167 -1247 -1080 

a The starting structures used for IPRO designs. GL in the parentheses indicate the structure is germline and AM is for affinity maturation. 

b The number of mutations identified by comparing GL and AM antibody sequences. 

c The numbers of designed mutations occurring both in GL and AM antibodies. 

interaction energy between an antigen and an antibody. All energies are in kcal/mol. 

e The IE difference calculated from AM - GL. 

doi:1 0.1 371 /journal.pone.01 05954.t003 



(AG) in Table S4. That this result could be recapitulated in 
forward and reverse designs for both systems demonstrates the 
robustness of the sampling and scoring function of IPRO. This is 
critical for the correct identification of a set of mutations that 
improve binding to an antigen. 

In the second test, only forward designs were performed because 
the aim is to assess the ability of IPRO to recapitulate the native 
mutations. The preferred amino acid types for FR residues were 
assigned according to site-specific amino acid probabilities 
obtained from the alignments of existing broadly neutralizing 
influenza and HIV antibodies, respectively (Data SI and S2). No 
mutation positions or preferred amino acid types for the CDRs 
were specified. As seen in Table 3, in the designed sequences 35% 
and 20% of mutations in the native AM influenza and HIV-1 
antibody models, respectively, are recaptured. These percentages 
are comparable to those from a protein interface design for Ran 
GTPase with 22-39% native sequence recovery using single- 
constraint design strategy and mutating only interface residues 
[47] . Our design is more challenging because the entire sequence 
space requires sampling, typically more than 200 positions for 
paired heavy and light variable domains. In addition, it has been 
observed that native somatic mutations evolved from the same GL 
antibody against the same antigen epitope can be quite diverse 
[43,44,48]. Among the ten VRCOl-like HIV broadly neutralizing 
antibodies isolated by RSC3 binding from different donors, only 
two residues from the same germline IGHV 1-2*02 allele mature to 
the same amino acids [48]. Therefore, for a given mutation 
position there exist multiple native mutations. In this test, we used 



only one AM antibody as the reference for defining the native 
mutations. The two reported percentages of native mutation 
recovery would be higher if multiple AM antibodies were used as 
native references. 

Benchmark test for the humanness score 

The fundamental assumption of our humanization protocol is 
that human sequences contain ensembles of local sequences that 
possess minimal immunogenicity due to a lack of binding to MHC 
and/ or recognition by T cells. Based on this assumption, HScore 
in OptMAVEn was developed to measure the 9-mer differences of 
all possible 9-mer frames in a test sequence by comparing them to 
a precompiled human 9-mer-sequence database. An immunolog- 
ical fragment with size of 9 is chosen because it is the basic peptide 
unit required for high affinity binding to MHC II [49] . Essentially, 
an antibody humanness score should have the ability to 
differentiate human antibody sequences from other species' 
antibody sequences (e.g. mouse) with high specificity. To meet 
this end, analysis of the human, mouse, rat and rabbit antibody 
sequences was performed according to the HScore definition and 
the results are summarized in Table 4. According to its definition, 
a lower HScore indicates sequences that are more homologous to 
human antibody sequences and are predicted to have lower 
immunogenicity than a higher HScore and vice versa. 

Overall, there was a marked difference in HScores between the 
human and mouse sequences with p<0.000001. Even better 
statistical significances were observed for the rat and rabbit 
antibodies, but the number of sequences evaluated was many 



Table 4. Calculated HScores of human, mouse, rat and rabbit sequences. 



Species Region Number of sequences Mean 9 SD (±) b 



Human 


VH 


3000 


0 


0 




VL 


3000 


0 


0 


Synthetic 


VH 


720 


40 


41 




VL 


599 


21 


33 


Mouse 


VH 


3462 


54 


29 




VL 


3480 


34 


27 


Rat 


VH 


1062 


57 


42 




VL 


173 


99 


49 


Rabbit 


VH 


244 


149 


18 




VL 


244 


109 


19 



a The mean of HScore. 

b The standard deviation of HScore. 

doi:1 0.1 371 /journal.pone.01 05954.t004 
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fewer than for mouse. It appears that synthetic sequences are more 
similar to those of human than mouse, rat and rabbit. For 
example, synthetic VL has a mean lowest mean HScore of 21. The 
lower HScore of synthetic antibodies are consistent with their 
sequence contents, which are often created through joining two 
genes, one of which is human. Conversely, rabbit VH has the 
highest mean HScore of 149, indicating it has the most different 
local 9-mer sequences to human VL sequences. Overall, our 
results demonstrate that HScore is an efficient humanization score 
for evaluating the predicted immunogenicity of a target antibody 
sequence by correctiy differentiating human sequences from other 
mammal. 

Two case studies: Design of models against influenza 
hemagglutinin and HIV gp120 

We showed that OptMAVEn could accurately position antigens 
in the predefined antibody-binding site, select appropriate 
antibody parts from MAPs and identify interaction energy 
improving mutations while distinguishing human from other 
mammal sequence. OptMAVEn is carried out by combining 
antigen positioning, MILP based antibody model assembly and 
IPRO based in silica affinity maturation as a complete pipeline. 
For demonstration we applied OptMAVEn to design broadly 
neutralizing antibody models that target HA and gpl 20, which are 
well-know antigens for the influenza [45,50-52] and HIV-1 
[46,53-55] viruses, respectively. 

Identification of epitope 

HA proteins, which attach the influenza virus to sialic acid 
receptors on the cell surface, are a prime drug target. The majority 
of human antibodies are directed against sites on the head of HA1, 
in particular the receptor-binding site, to prevent the attachment 
of virus to target cells. Other antibodies target the stem region to 
prevent membrane fusion. In this study, we target the receptor- 
binding site of HA1, the epitope in PDB 3SM5. The B chain and 
some residues of chain A far from the binding site were ignored to 
reduce computation time. The receptor-binding site is a broad and 
shallow pocket framed by three loops and one helix forming the 
outer ridges, illustrated as the 130-loop (magenta), the 150-loop 
(yellow), the 190-helix (blue) and the 220-loop (red) in Fig. S3 A. 
The second study involves designing antibody models against the 
gp 1 20 protein of the human immunodeficiency virus type 1 (HIV- 
1). Gpl 20 is a glycoprotein exposed on the surface of the HIV 
envelope vital for virus entry into a cell. A series of broadly anti- 
HIV-1 neutralizing antibodies bind to sites located in variable 
regions of gp 1 20 and help prevent HIV infection. In this study, we 
designed antibody models targeting the CD4-binding site of gpl 20 
[56]. The gpl 20 structure was obtained from PDB 3NGB and the 
epitopes residues are located in CD4-binding-loop, V5-loop and 
D-loop (Fig. S3B). For each antigen, we chose two targeted 
epitopes: one includes all the residues in the receptor-binding site 
(named as HA-all and gpl20-all, respectively) and another 
includes residues that form a loop in the receptor-binding site 
(i.e., 130-loop for HA1 (HA-130) and CD4-binding-loop for gpl20 
(gpl 20-365)). 

Assembly of antibody models from MAPs 

Table 5 reports the selected best MAPs structures for each 
epitope before and after a local conformation refinement of the 
antigen. The negative MILP energies in each case suggest the 
selected MAPs structures favor antigen binding. Significandy 
reduced MILP energies were observed after a refinement of the 
local antigen conformation and a reselection of the best MAPs 
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V CDR3 J V CDR3 J 




V CDR3 J V CDR3 J 




Figure 4. Designed antibody model. (A)-(D) Model structures for epitopes of HA-all, HA-130, gp1 20-all and gp1 20-365 before (in yellow) and 
after (in orange) refinements. (E)-(H) MILP reselected best scored MAPs after refinements. H and L chains are colored in cyan and green, respectively. 
V, CDR3 and J represent corresponding MAPs (See definition in Method). 
doi:1 0.1 371 /journal.pone.01 05954.g004 



structures. Note that the refinement step was only performed for 
the antigen position with the best-scored MAPs structures and 
aims to provide a further local conformational sampling around 
the best conformation from the initial screen. Comparing the 
antigen conformations before and after refinement (Fig 4A-D), 
with all RMSDs higher than 2 A, confirms that local antigen 
refinement had a pronounced effect on the antigen conformations 
and the selection of the best-scored MAPs structures. 

Figure 4E-H show the selected MAPs structures. It can be seen 
that the CDR3s in both VH and VL exhibit considerable diversity, 
in accordance with their critical roles in recognizing different 
epitopes. In addition, the V parts exhibit some variability as well, 
especially in the CDRs. By contrast, there is less variability in the 
selected J parts. The selection results from these four designs 
indicate that the MILP could identify promising initial antigen 
positions. Once the best low energy conformations are identified, 
an additional refinement step is required to explore the local 
antigen positions for obtaining the final modular parts. 

In silico affinity maturation and humanization 

To increase the relevance of the identified designs, the 
permitted amino acid mutations at each FR position were pre- 
selected according to the amino acid frequency of each kind of 
amino acid at that position in alignments of broadly neutralizing 
HIV and influenza antibodies (Data SI and S2). However, the 
residues in CDRs were allowed to mutate into any of the 20 
standard amino acids. Figure 5 depicts the predicted lowest-energy 
amino acid sequences and corresponding alignments between the 
initial and designed sequences. The sequence distribution of 
mutations involved in affinity maturation is very diverse, with no 
clearly preferred amino acids/positions. 

The HA-all and HA-130 antibody models have 10/9 and 11/19 
mutations in VH/VL, respectively (Table 6). In contrast, gpl 20- 



all and gpl 20-365 antibody models have 12/7 and 11/6 
mutations in the corresponding domains. Antibodies typically 
accumulate 5-20% changes in somatic mutations, but in the case 
of HIV- 1 -neutralizing antibodies, the level of somatic mutation 
ranges from 15% to 44% [57]. Our results suggest that the 
mutation rate from computational affinity maturation is compa- 
rable to that of somatic mutations evolved in vivo. 

For all four designs, the IPRO results show significant energy 
improvements both in complex and interaction energies, alluding 
that the designed antibody models are much more stable and 
could bind to the antigens more tightly (Table 6). Figure 6 shows 
the amino acid compositions before and after affinity maturation. 
Interestingly, Glu, Gly, His, Met, Arg and Ser occur more than 
other amino acids (counts > = 3) in designed sequences, whereas 
Leu, Pro and Tyr occur less frequently (counts < = 3). Table 7 
summarizes the changes of usage during affinity maturation. An 
apparent trend is that charged residues are favorable in the affinity 
mature sequences while aromatic residues are disfavored. Despite 
the importance of aromatic residues in the binding to antigens 
[31], this finding is in agreement with previous studies that 
tyrosine and tryptophan are abundant in the germline genes and 
the degree of aromaticity is typically reduced during affinity 
maturation [58]. Meanwhile, the number of polar residues is 
slightly increased. More polar and charged residue occurrences 
contribute to the improvement of binding affinity and complex 
stability in the solvent. 

Table 6 lists the humanness score for the initial and designed 
antibody model sequences. Most of the initial sequences, except 
for the HA-130 light chain and gpl 20-365 heavy chain possess 
low humanness scores, indicating they are already human-like 
sequences and probably have low immunogenicity. The predicted 
immunogenicity contributions of designed sequences are primarily 
from CDR3 because the V* and J* model structures in the MAPs 
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HA-all 



|| RLQLQESGPGLVKPSETLSLTCTVSGGSISSSSYYWGWIRQPPGKGLEWIGSIYYSGSTXYNPSLKSRVTISVDTSKNQFPLKLSSVTaADTAVYYCARIDWITSHEDDFGDYYTGEYYGLDSWGKGTTVTVSS 
EVQLQESGAGLVKPSETVSLTCTVSGGSISSSSYYWGWIRQSPGKGLEWIGSIYYSGSTNYNSSLKSRVMISVDTSKNQFPLKLSSVTAGDTAVYYCARIDWITSHEDDFGDYYTGEYYGLDSWGKGTTVTVTS 



L E I VLTOSPATLSLS PGERATLSCRASOGVSSNLAVJYOOKPGOAPRLL I YDASNRATGI PARFSGSGPGTDFTLTI SSLE PEDFAVY YCOOYHSYPPTFGPGTKVD I K 
EIVMTQSPATLSGSPGERATLSCRASQGVSSNLAWYQQKRGQAPRLLIYDASKRATGIPARFSGSGPGTDFTLTISRMEPEDFAVYYCQQYHSYPPTFGGGTKVEVK 



HA-130 



H QVQLVESGGGVVOPGGSLRLSCAASGFTFSS YGMHVA/ROAPGKGLEV7\'AF I RYDGSNKY YADSVKGRFT I S RDNSKNTLYLQMNSLRAEDTAVY YCTRSDGRNDHDSVIGOGTLVTVSS 
QVRLVESGGGWQPGGSMRLSCAASGFTFSQSGMHHVRQAPGKGLEHVAFIQSDGSHKYYADSVKGRFZISRDDSKHTLYLQMHSLRSEDTAVYYCTRSDGRNDMDSWGQGTTVTVTS 



L D I VMTOTPLSLPVTPGEPAS I SCRSSESLLDTDE YTYLNVJYLOKPGOS PQLLI YEVSNRASGVPDRFSGSGSGTDFTLK ISRVEAEDVGVY YCLGGYPAASYRTAFGOGTKVE IK 
DTVMTQSPLSLPVTPGEAASISCRS5QSLLHTDEKTYLHWYLQKPGQSPQLLIYEVSHRASGVPDRFSGGGSGIEFTLTISRVEAEDVGVYYCASGYNYDSDSGTFGQGTKLEIK 



gp!20-all 



|| EVQLVESGC-GLVOPGRSLRLSCAASGFTFDDYAMHWVRQAPGKGLEWVSGISWNSGSIGYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTALYYCAGVYEGEADEGEYDNNGFLKHWGQGTLVTVSS 
EVRLVESGGGLVOPGKSLRLSCTASGFTFKDAAMHWVROAPGKGLEV.^'SGI SSNSGS I GYADSVRGRFTI SRDNAKRSLYLOMNSLKSEDTAVY YCAGVYEGEADEGE YDNNGFLKHWGQGTLVTVTS 



L SYELTQPPSVSVSPGQTARITCSGDALPKKYAYWYQQKSGQAPVLVIYKDSKRPSGIPERFSGSSSGTMATLTISGAQVEDEDDYYCQAWDSSTAWFGTGTKVTVL 
SHELTOPPSVSASPGQTARITCSGDALPKKYAYVJYQQKAGOAPVLVI YKDRERPSGI PERFSGSSSGTt-lATLTI SGAQVEDEDDY YCQAVJDSSTAWFGGGTKLTVL 



gp 120-365 



EVQLVQSGAEVKKPGATVKISCKVSGYTFTDYYMHWVQQAPGKGLEWMGLVDPEDGETIYAEKFQGRVTITADTSTDTAYMELSSLRSEDTAVYYCARRQLGLPWFAYWGQGTLVTVSS 
EVQLQQSGAEVKKPGESVKVSCKASGYTFTDYYMHWVQQAPGKGLEWMGLVDPADGETIYAEKFQGRVTITADTSASTAYMELRGLRSEDTAVYYCARRQLGLPWFAYWGQGTTVTVSS 



FR1 CDR1 FR2 CDR2 FR3 CDR3 FR4 

L SYVLTQPPSVSVAPGQTARITCGGNNIGSKSVHWYQQKPGQAPVLWYDDSDRPSGIPERFSGSNSGNTATLTISRVEAGDEADYYCAAWDDSLNGPWVFGSGTKVTVL 
SSVLTQPPSVSVALGQTARISCGGNNIGSKSVHWYQQKPGQAPVLWYDDSDRPSGIPERFSGSNSGNTASLTISRVEAGDEADYYCAAWDDSLNGHWVFGGGTKVTVL 

Figure 5. Alignments between designed and initial antibody model sequences for epitopes of HA-all, HA-130, gp120-all and 
gp1 20-365. FRs, CDRs regions and the lengths of sequences are indicated on top of each alignment. Yellow shading shows introduced amino acid 
mutations. 

doi:1 0.1 371 /journal.pone.01 05954.g005 



database are based on human genes whereas CDR3 has some 
mouse genes to ensure the maximum amount of structural 
diversity [25] . After the design, as expected, the humanness scores 
are decreased or maintained at a similar level. For example, the 
HScore of gp 120-365 light chain is reduced to 0 from 4 and that 
of HA-130 heavy chain is maintained at 0, both predicting no 
immunogenicity in the final designs. However, it is also noted that 
the light chain of HA-130 still has a HScore of 32, comparable to a 
mouse or rat sequence. This shows that upon design the highest 
binding affinity design may not have low immunogenicity. 

The two generated designs against HA1 exhibit highly diverse 
amino acid sequences, and antigen locations/ orientations, but all 
share the receptor-binding site (Fig. 7 A and 7B). Interestingly, the 
HA-all antibody model possesses a long HCDR3 loop composed 
of 26 amino acids, which does not bind directly to the epitope 
residues, but interacts with non-epitope loops on the right side of 
HA1, as shown in Figure 7A. Such unusually long CDR3s [59] are 
rarely found in existing HA1 antibodies, and provide a larger 
antigen-binding surface, potentially introducing more favorable 
interactions. Moreover, both HCDR1 and HCDR2 insert into the 
receptor site and interact directly with the 130-loop with four 
hydrogen bonds (Fig. 7E). In receptor analog LSTc bound to 1934 
HA (PDB 1 RVZ), the carboxylate group of sialic acid forms two 
hydrogen bonds with backbone amide of HA1 Alal37, as does the 
side-chain hydroxyl oxygen of HCDR2 Tyr58 with the backbone 



amide of HA1 Ala80. The amide of the acetamido group has a 
hydrogen bond with the backbone oxygen of HCDR1 Vail 35, as 
does the backbone oxygen of HA1 with HCDR1 Ser35 (Fig. 7G). 
In addition, the side-chain carboxyl oxygen of HCDR2 Asp59 of 
HA-130 antibody model forms the same hydrogen bond with the 
backbone amide of HA1 Ala80 (Fig. 7F). Thus, the designed HA- 
all and HA-130 antibody models partially mimic the human 
receptor that interacts with HA1 via insertion of HCDR1 and 
HCDR2 into the receptor-binding site. This mode of receptor 
mimicry has also been observed in related broadly neutralizing 
antibodies CH65 [45] and 5J8 [60], which use HCDR3 insertion, 
and SI 39/1, which uses HCDR2 insertion. Significandy, the 
algorithm automatically generated these results with the only user 
input being the identification of the epitope as a whole. The 
recapitulation of this trend suggests that OptMAVEn could 
reproduce expected antigen binding motifs. The newly designed 
antibody models are expected to broadly neutralize a large 
number of strains from a single HA or selected strains from 
different subtypes and groups of influenza because the epitope in 
the receptor-bing site is relatively conserved [61]. 

Both of the designed antibody models against gpl20 are 
predicted to interact directly with the targeted CD4-binding loop. 
However, they adopt quite different binding modes (Fig. 7C and 
7D). The gpl20-all antibody model uses the residues where 
LCDR2 attaches to grasp virtually all surface-exposed portions of 
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the CD4-binding loop. One hydrogen bond is formed: the side- 
chain guanidinium nitrogen of Arg67 with the backbone oxygen of 
gpl20 Gh/228 (Fig 71). By contrast, the gpl20-365 antibody 
model uses HCDR3 to interact with CD4-binding loop, which 
involves three hydrogen bonds: the amide nitrogen of HCDR3 
Gin 108 with the backbone oxygen of gpl20 Ser365; the backbone 
oxygen of Glnl08 with the backbone nitrogen of gpl20 Gly367; 
the backbone nitrogen of HCDR3 GlyllO with the carboxyl 
oxygen of gpl20 Asp368 (Fig. 7J). Among CD4-binding site- 
directed antibodies, most antibodies (such as the VRC01 class) 
structurally mimic the CD4 receptor (Fig. 7H, PDB 3JWD) by 
substantial P-strand contacts to the epitope using HCDR2 [56]. 
The HCDR3-dominated mode of interaction adopted by the 
gp 120-365 antibody model is similar to a recent isolated broadly 
neutralizing HIV-1 antibody CHI 03 [33], which is less mutated 
than most other CD4-binding site antibodies and reveals a new 
loop-based mechanism of antibody neutralization. 

Summary and Discussion 

Previously, we have developed OptCDR for the de novo design 
of antibody binding pockets composed of CDRs against any 
specified antigen. We used OptCDR to generate a library of 
antibody CDRs models against the FLAG antigen (i.e. DYKD) 
[62] and a preliminary analysis of biological experiments has 
shown multiple, unique antibody bindings (unpublished data). 
Here, by expanding the idea of OptCDR, we presented an 
efficient and general method for de novo design of the 
computational models of the entire antibody variable domain 
targeting a specific antigen epitope. Given an antigen, the method 
entails positioning the antigen in a virtual antibody-binding site, 
identifying the best modular parts from the MAPs database based 
on the positioned antigen poses, de novo assembly of these MAPs 
into the initial antibody model structure, and designing mutations 
in the antibody model structure to simultaneously improve binding 
affinity and reduce immunogenicity. For both HA and gpl20, 
novel antibody models with numerous interactions with their 
target epitopes were generated. The observed rates of mutations 
and types of amino acid changes are consistent with what has been 
seen during in vivo affinity maturation. It is especially encouraging 
that these features were not controlled for in OptMAVEn but were 
automatically generated. Additionally, in the case of the HA 
antibody models, a method of antigen binding that has been 
previously observed in broadly neutralizing anti-influenza anti- 
bodies was discovered in both designs. 

In OptMAVEn, a pre-generated conformer library was adopted 
to represent the possible antigen binding poses. Despite success in 
most of the 120 benchmark cases (success rate of 96%), in some 
cases, the antigen positioning algorithm still could not sample any 
conformation having a RMSD within 4 A of the native. This 
limitation may have largely been due to the insufficient rotational 
sampling of antigens around the X and Y axes. Although this 
limitation could be overcome by adding further sampling around 
both axes, caution should be taken. For each antigen conforma- 
tion, its IE to each modular antibody part needs to be pre- 
calculated and this computation could quickly become intractable 
with a significantly increased number of antigen conformations. 
Therefore, it is suggested to choose an appropriate size of the 
conformer library by adjusting the parameters of antigen 
positioning if further sampling is required. 

OptMAVEn generates a prototype antibody model that targets 
an antigen by assembling the six best-scored MAPs structures. 
This idea is inspired by the natural evolution of an antibody in 
vivo, in that the gene of a germline antibody is initially assembled 
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Figure 6. Counts of the type of amino acid mutations before (blue) and after (red) computational affinity maturation from the four 
best designed antibody models for epitopes of HA-all, HA-130, gp120-all, and gpl 20-365. 

doi:1 0.1 371 /journal.pone.01 05954.g006 



by V-(D)-J recombination. Note that the three differing MAPs 
regions do not match exactly with their V-(D)-J genes because 
MAPs utilizes CDR3 as a structural component representing the 
whole diversity region. The primary advantage of using MAPs to 
assemble an antibody model is that the prediction is fast and there 
is no need for de novo folding calculations. The computational 
savings do not come at the expense of accuracy of prediction, as 
the RMSD of the predicted structures is at least as accurate as 
earlier methods [25]. In addition, compared to traditional 
fragment-assembly-based approaches for de novo protein structure 
prediction, this approach could efficiently sample both local and 
non-local contacts that are inherently present in the relatively large 
structural fragments. The designed sequences therefore should 
have a higher probability of proper folding. It should be also noted 
that MAPs database includes a considerable amount of informa- 
tion on pairwise clashes between parts. These clashes can thus be 
avoided by not selecting both members of the pair by the MILP 
problem in OptMAVEn. The aim of the MILP problem is to find 
the best combination of the six MAPs with the lowest IEs with the 
targeted antigen. In its current implementation, it does not 
explicitly account for the interactions between MAPs except for 
excluding clashing pairs. In our benchmark test, we found that the 
selected MAPs have better IEs than native antibodies in 42.5% of 
120 cases. While approximations in the energy function may 
account for some of the improved scores, the general trends allude 



that the cohort of designed sequences should bind more tighdy 
than the native ones. 

A simplification of the protocol in its described implementation 
is that it uses IE instead of binding free energy for evaluating the 
binding affinity of an antigen and antibody model. Although we 
found that IEs are in good agreement with experimental binding 
data in two GL-AM antibody pairs, this replacement is still at the 
expense of an overall bias in energy function, especially in systems 
where conformational entropy is crucial for the binding interac- 
tion. This is supported by the fact that we obtained large negative 
IEs (Table S3) that are not quantitatively comparable to true 
experimental binding energies for predicting the binding affinity of 
native antigens and antibodies. This may have been due to a lack 
of entropy term in the formulation of IE (Table S4). Effective 
entropy could quantify the variability of sequence and their 
rotamer states that are consistent with a target structure and other 
properties specified as constraints on the properties of the 
sequences [63] . However, the role of side-chain entropy plays in 
protein design is still unclear. For example, both Kendra [17] and 
Daniele el al. [64] demonstrate that changes in protein confor- 
mational entropy can contribute significantly to protein sequence 
design. Conversely, Hu et al. suggest that side-chain conforma- 
tional entropy has a relatively small role in determining the 
preferred amino acid at each residue position in a protein except 
for longer amino acids: methionine and arginine [17]. Neverthe- 



Table 7. Statistics of amino acid propensities from the four best designed antibody models for epitopes of HA-all, HA-130, gpl 20- 
all and gp120-365. 





Stage 3 


Aliphatic" 


Aromatic b 


Polar' 


Charged 0 


Before 


35 
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28 


15 


After 


29 
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31 


24 



a Before or after the design. 

b The counts of amino acids belonging to this classification. 
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Figure 7. Structures and binding modes of designed antibody models for epitopes of HA-all, HA-130, gp120-all, and gp1 20-365. H 

and L chains are colored in cyan and green, respectively. Antigens are colored in orange. Hydrogen bonds are highlighted in dashed line and colored 
in magenta. (A)-(D) Overall complex structures. (E) and (F) Antibody models that recognize 130-loop in the receptor-binding site of HA1. (G) 
Interaction of receptor analog LSTc in the receptor-binding site of HA1. (H) Interaction of CD4 and CD4-binding loop of gp120. (I) and (J) Antibody 
models that recognize of CD4-binding loop of gp120. 
doi:1 0.1 371 /journal. pone.0105954.g007 



less, future work will consider introducing an appropriate entropy 
model into the energy evaluation. 

Another concern in the energy function is the employed 
Lazaridis-Karplus implicit solvation model [65] which does not 
adequately capture charged residue interactions with the solvent. 
This is reflected by some test designs where charged residues were 
continually selected without solvent exposure (data not shown). 
This is a limitation of current implicit solvation models, which 
yield higher or similar water-to-protein transfer free energies for 
nonpolar as for many of the polar residues. As a result, they 
improperly favor the burial of polar amino acids in the protein 
interior over nonpolar ones [66]. 

During design runs, we also found that the energy calculations 
often end up with a local energy minimum instead of the global 
energy minimum due to the incomplete sampling of the complex 
energy landscape given the allotted cpu time. This could lead to 
inconsistent results between different repetitions of the same 
computational design run. To address this issue, we introduced 
ensemble structure refinement, which is a refinement carried out 
by generating an ensemble of structures (default of 1 0) from the 



"refinement" iterations (default of 25) of IPRO. A structural 
refinement has the same steps as a normal IPRO iteration except 
no mutations are allowed. After all refinement iterations have been 
completed, the average properties of the refinement ensemble are 
evaluated to determine whether or not a particular design is 
actually the best identified so far. This ensemble approach to 
evaluating structures has given a high correlation (R 2 = 0.960) 
between calculated IEs and experimentally measured binding 
affinities in previous work [17]. 

Another challenge for die de novo design is the uncertainty 
associated with the employed scoring function [63]. For example, 
energy functions are approximate, side-chain conformations are 
treated discretely and solvation is represented using simplified 
models. One solution is to apply site-specific amino acid probabil- 
ities, which takes advantage of known inter-atomic interactions and 
structural features to yield sequence information consistent with 
naturally folded structures and desired functional properties. It is well 
known that highly conserved residues are generally distributed in the 
FRs, whereas the residues in CDRs tend to be considerably diverse. 
Therefore, in OptMAVEn, we applied two different strategies to 
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determine the permitted amino acid kinds to mutate for residues 
both from FRs and CDRs. The permitted kinds for FRs residues 
were site-specific and the probability in each position was obtained 
from the sequence alignment of broadly neutralizing HIV and 
influenza antibodies (Data S 1 and S2), while those for CDRs residues 
had no any constraint at all. In both case studies, multiple solutions 
with quite different sequences and binding modes were identified. 
Improved mutations were found in both the FRs and the CDRs. The 
designed antibody models have an apparent trend of obtaining more 
polar and charged residues while disfavoring aromatic residues. 

Designed antibody models could be immunogenic and cause an 
unexpected and serious immune reaction if applied. To address 
this issue, we introduced an HScore to evaluate the immunoge- 
nicity of designed antibody models in silico. The aim is to limit the 
sequence design to mutations that either maintain or increase the 
human-like sequence content. Our algorithm is inspired by a very 
similar humanization method called HSC [23], which is calculated 
by determining the maximal identity between a string of test 
sequence and any sequence in an aligned set of human sequences, 
and summing theses values over all pertinent sequence positions. 
Although HScore works well to distinguish human antibody 
sequences from other species, it should be noted that OptMAVEn 
does not guarantee the design of antibody models without any 
immunogenicity (HScore = 0) under the current design parame- 
ters. OptMAVEn seeks to design antibody variant models 
simultaneously having reduced immunogenicity and improved 
binding in silico, and these two properties are often in inverse 
proportion (i.e. high affinity binders may possess high immuno- 
genicity and vice versa). This is supported by our case studies, 
where the designed light chain of HA- 130 still has an HScore of 
32. This is comparable to what would be expected in a mouse 
antibody. Also, fewer mutations were found to occur in CDR3, 
where a considerable amount of energetically favorable mutations 
were observed in naturally occurring AM antibodies. Parker et al. 
demonstrated a similar difficulty in optimizing both the stability 
and immunogenicity of therapeutic proteins by a structure-guided 
deimmunization strategy [28]. Another probable reason for the 
failure of deimmunization is that the maximum number of allowed 
residues (3 under the current study) to mutate and the site-specific 
amino acid probabilities both impose limits on the number of 
sequences for immunogenicity evaluation. During some IPRO 
iterations, few or no mutations are presented for further energy 
optimization. As a result, in general trade-offs must be made to 
design antibody models either with predicted moderate binding 
affinity and lowest immunogenicity or highest binding affinity and 
moderate immunogenicity. 
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